ED 137 336                                              TM 006 136

AUTHOR        Hill, Richard K.
TITLE         Conducting Linear Regression Analysis When
              Observations Have Varying Standard Errors of
              Measurement.
PUB DATE      [Apr 77]
NOTE          8p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (61st, New
              York, New York, April 1977)

EDRS PRICE    HC-$1.67 Plus Postage.
DESCRIPTORS   Educational Assessment; *Multiple Regression
              Analysis; *School Districts; *Standard Error of
              Measurement; State Programs; *Statistical Analysis;
              *Test Interpretation
IDENTIFIERS   California Assessment Program



ABSTRACT

An assumption underlying multiple linear regression
is that the standard error of measurement is equal for all
observations. The literature has not addressed the procedures to be
used when this assumption is violated. It was clear that data
analysis to be performed on districts in California would severely
violate this assumption, since district mean scores were to be the
criterion, and districts vary tremendously in size. The statistical
techniques that were developed to conduct these analyses are
described. (Author)









Conducting Linear Regression Analysis
When Observations Have Varying Standard Errors of Measurement

Richard K. Hill
California Department of Education



Introduction

To facilitate interpretation of test results provided by the California
Assessment Program, districts are provided with a statistic called the
comparison score band. This band is computed in two steps: first, a
predicted score is obtained for the district by regressing the school
mean test score on a series of input variables collected about the
school (called background factors), and then the standard error of estimate
is added to and subtracted from the predicted score. While such a statistical
procedure is routine in most applications, it is not so here.

An assumption underlying multiple linear regression is that the standard
error of measurement is equal for all observations. But because districts
in California vary greatly in size (Los Angeles City Unified School District
tests 43,000 pupils per grade annually, while some rural districts test
just one), it is clear that this assumption is grossly violated, and the
consequences are extreme and observable. For example, the multiple
correlation between the background factors and third grade mean test scores
is less than .6 when computed across all districts in California. That
same correlation becomes almost .8 when districts testing fewer than 10
pupils (about 10 percent of the districts) are removed from the analysis.
Thus, standard regression procedures leave one with two unsatisfactory
alternatives: either use all districts, both large and small, in the
regression analysis, thereby allowing the large measurement error associated
with small districts to mask the relationships that actually exist between
the background factors and the criterion, or arbitrarily eliminate smaller
districts from the regression process.

Even if the regression problem were to be solved, and a predicted score
could be fairly computed for each district, it still would be inappropriate
to add and subtract the same standard error of estimate for all districts.
It has been observed for some time that the mean square residual is
greater for small districts than for large ones. For this reason, a
procedure which employed the same standard error of estimate for all districts
would greatly underestimate the error for small districts, while greatly
overestimating the error for large districts.

A review of the literature revealed that this problem has not been addressed.
While the literature is replete with examples of regressions done using data
from individuals, or using means of groups of equal size, no solutions
to this particular problem are published. Forsyth, for example, in his
published techniques for conducting such regressions for the state of
Iowa, completely ignored the problem; similarly, meetings with statisticians
from departments of education throughout the country revealed that
they had a similar awareness of the problem, but also were unsure as to
what a correct data analysis would be. This paper describes the techniques
that have been developed over the past two years in California to conduct
regression analyses and to compute the associated standard error of estimate
for each observation.



Computing the Multiple Linear Regression Line

As pointed out in the introduction, an assumption underlying multiple linear
regression analysis is that the standard error of measurement is equal across
all observations. The violation of this assumption has serious consequences
for the regression analyses run at the district level by the California
Assessment Program. If the regression were to be computed using all
districts in California, the multiple correlation between predictors and the
criterion would be around .6. If districts testing fewer than 10 pupils
per grade were to be eliminated from that same analysis, the multiple
correlation would jump to almost .8. This occurs because of the large
amounts of error associated with both the predictors and the criterion in
small districts. The large amounts of random error introduced by these
districts into the computation of the linear regression equation obscure
the relationship that exists among the predictors and criterion for the
vast majority of school districts. Since the results of the regression
analysis are reported to all districts, the issue never was whether to do
something to make the regressions reflect the actual relationships more
accurately; it was a question of what action would be most appropriate.

The Development of a Solution 

The first year the analyses were run, the problem was handled simply by
eliminating the small districts from the analysis; 106 districts, out of
914 districts throughout California, were eliminated. This was an
unsatisfactory solution, however. It seemed unjustifiable for any district, no
matter how small, to be completely eliminated. In addition, that solution
still gave equal weight to small districts of 20 or 30 pupils per grade
and the large city districts. It seemed clear that the most equitable
solution would be to compute the regression lines making use of some
weighting scheme. The choice of a weighting scheme, however, did not seem to
be straightforward.

The first weighting scheme tried was to weight all districts by
the number of students tested. This procedure resulted in multiple
correlations of .99, a value unrealistically high. The result probably occurred
because of the great size and deviation from the mean on both predictors
and criterion of Los Angeles City.

Current Practice 

The search for a realistic approach to conducting the regression analysis
concluded after reconsideration of why the problem existed in the first
place. Since it was the differing standard errors of measurement that
were at the heart of the problem, it seemed reasonable to use that statistic
in the weighting. The final solution, and current practice, is to use
the reciprocal of the standard error of the mean as the weighting factor.
This method produces a result highly satisfactory on all counts: the size
of the multiple correlation is reasonable (around .85); all districts are
included; and larger districts can be assured that they had a heavier
weight in the determination of the statewide regression line.
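
The weighting described above can be sketched as a weighted least squares
fit. This is a minimal illustration, not the program's actual code: the
function name and the district figures are hypothetical, and it assumes that
the weight (the reciprocal of each district's standard error of the mean)
multiplies the squared residual, a detail the paper does not spell out.

```python
import numpy as np

def weighted_regression(X, y, se_mean):
    """Weighted least squares with weights 1 / (standard error of the mean).

    Assumption: each weight multiplies that district's squared residual.
    Returns [intercept, slope_1, ..., slope_k].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(se_mean, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    sw = np.sqrt(w)
    # Scaling each row by sqrt(w) turns the weighted problem into an
    # ordinary least squares problem.
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical districts: one background factor and a pupil-level
# standard deviation of 10 score points (both invented for illustration).
n_pupils = np.array([5, 20, 80, 400, 5000])
background = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mean_score = np.array([58.0, 61.0, 66.5, 68.0, 70.2])
se_mean = 10.0 / np.sqrt(n_pupils)

print(weighted_regression(background, mean_score, se_mean))
```

Because the standard error of the mean shrinks with the square root of the
number of pupils, these weights grow with the square root of district size
rather than with district size itself, which is a gentler weighting than the
pupil-count scheme that produced correlations of .99.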

Computing the Standard Error of Estimate
of the Predicted Score

When a predicted score is generated for a district by the California
Assessment Program, a value is added to and subtracted from that score to produce
a band rather than a point estimate. The band is desired to be of a size
such that 25 percent of the districts score below their comparison score
band, 50 percent score within, and 25 percent score above.
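
A band intended to capture the middle 50 percent of districts can be sketched
as follows. This is an illustration under an assumption the paper does not
state explicitly, namely that residuals are roughly normally distributed; the
function name and the sample figures are hypothetical.

```python
from statistics import NormalDist

def comparison_band(predicted, std_error, central_share=0.50):
    """Return (low, high) so that `central_share` of normally distributed
    residuals with standard deviation `std_error` fall inside the band."""
    # For a 50 percent band this is the 75th percentile of the standard
    # normal distribution, roughly 0.6745.
    z = NormalDist().inv_cdf(0.5 + central_share / 2.0)
    half_width = z * std_error
    return predicted - half_width, predicted + half_width

# Hypothetical district: predicted score 65, standard error of estimate 4.
low, high = comparison_band(predicted=65.0, std_error=4.0)
print(round(low, 2), round(high, 2))
```

Under this assumption a district's band is only as good as the standard error
fed into it, which is why the remainder of the paper concentrates on
estimating that error separately for districts of different sizes.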

It had been observed for several years that the size of the band should be
dependent on district size. Larger districts have less measurement error
in both their predictors and criterion scores and should have smaller bands.
If all districts were to receive bands of the same width, most large
districts would score within their comparison score band, while few small
districts would. At one time, districts were divided into three groups
(small, medium and large) and assigned a band width accordingly. While
this relatively crude procedure produced acceptable results, an
investigation was conducted to see if a more sophisticated and precise way could
be established for determining the appropriate standard errors of estimate.

The Development of a Solution

The development of an equation to calculate the standard error for schools
of a fixed size required first that a reasonable model of the standard
error be posited. The first model tried assumed that the variance error
was inversely related to the number of pupils tested in a school, i.e.,
that $\sigma_E^2 = \sigma^2 / N$.

If this model had been correct, then it would have been true that a plot
of log $\sigma_E^2$ vs. log N would be linear. Such a plot was made by grouping
districts of similar size and calculating the variance of the residuals.
The relationship was not linear, and the search for an effective model
continued.
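
The linearity check follows from taking logarithms of the first model. The
short derivation below is a sketch of that reasoning; the floor term
$\sigma_P^2$ in the last line is introduced here only to show why a nonzero
residual variance in very large districts would bend the plot.

```latex
% First model: error variance inversely proportional to N.
\sigma_E^2 = \frac{\sigma^2}{N}
\quad\Longrightarrow\quad
\log \sigma_E^2 = \log \sigma^2 - \log N
% A straight line in (log N, log sigma_E^2) with slope -1.

% If instead the error variance levels off at a floor \sigma_P^2 > 0,
\log\!\left(\sigma_P^2 + \frac{\sigma^2}{N}\right)
\;\longrightarrow\; \log \sigma_P^2
\quad\text{as } N \to \infty ,
% so the plot flattens for large N and is no longer linear.
```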



It was clear that one reason for the failure of the first model was that
districts of large size do not have residuals approaching zero. Any good
model would have to take into account that there is an asymptotic approach
of the residuals to some small but finite value as N increases. This line
of reasoning led to the generation of a second model, one which actually
was used to report data during the 1973-74 school year. The model posited
two variances: the first, called the variance of testing error, was
considered to be inversely related to the number of pupils tested; while
the second, called the variance of prediction, was assumed to be constant
for all districts. As an equation,

$\sigma_E^2 = \sigma_P^2 + \sigma_{TE}^2 / N \qquad (1)$

In most applications of linear regression these two error terms would not
be examined separately since, in the more typical case, it is reasonable
to assume that the variance of measurement error is equal for all
observations. There is nothing to be gained by separating the two variances,
and they are left combined. In such a case, the variance error of estimate
would be calculated as follows:

$\sigma_E^2 = \sigma^2 (1 - r^2) \qquad (2)$

However, in this case, it seemed clearly inappropriate to use such a
procedure. Since none of the necessary equations for computing the variance
error when measurement error changes is available in the literature, a stopgap
procedure was employed for the reporting of results of the California
Assessment Program for the 1973-74 school year. This procedure worked
and is detailed in the succeeding paragraphs for those who might be
interested. A more sophisticated procedure was developed subsequently, and the
explanation of that procedure concludes this paper.

The procedure for the 1973-74 school year simply involved computing the
median absolute residuals (expressed as a standard score) for all sizes of
districts. These median absolute residuals were plotted, as in Figure 1,
and a curve was drawn to estimate their values. Then two points were drawn
from the curve and $\sigma_P^2$ and $\sigma_{TE}^2$ were solved for.

These results were used as a first approximation. Then, $\sigma_P^2$ and
$\sigma_{TE}^2$ were varied slightly to see if a better fit to the medians
could be obtained. These modified values became the parameters of this
error-variance equation after being multiplied by the variance of test
scores (to correct for the fact that these were standard scores).

For example, the median absolute residuals for second-grade pupils, district
by district, were calculated and a line was drawn to fit these points. The
following values were generated from the line:



    Number of pupils      Median
    tested in the         absolute
    second grade          residual

          10                .65
          28                .40
          50                .36
          75                .34
         100                .31






Using the values for N = 10 and N = 100, the following two equations were
generated:

$.95063 = \sigma_{TE}^2 / 10 + \sigma_P^2$

$.21623 = \sigma_P^2 + \sigma_{TE}^2 / 100$

The solution is $\sigma_{TE}^2 = 8.16$ and $\sigma_P^2 = .135$.
This equation yields the following values:



    Number of pupils      Median
    tested in the         absolute
    second grade          residual      $.67\sqrt{.135 + 8.16/N}$

          10                .65                  .65
          28                .40                  .44
          50                .36                  .36
          75                .34                  .33
         100                .31                  .31
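
The two-point solution can be reproduced with a few lines of arithmetic. The
sketch below is an illustration, not the program's original computation: the
function names are hypothetical, and it assumes the conversion factor between
a median absolute residual and a standard deviation is 2/3 (printed as .67 in
the table), which is the value implied by the constants .95063 and .21623.

```python
import math

# Assumption: median absolute residual = (2/3) * standard deviation.
FACTOR = 2.0 / 3.0

def solve_two_points(n1, med1, n2, med2):
    """Solve variance = s_p2 + s_te2 / N through two (N, median) points."""
    v1 = (med1 / FACTOR) ** 2  # variance implied by the first median
    v2 = (med2 / FACTOR) ** 2  # variance implied by the second median
    s_te2 = (v1 - v2) / (1.0 / n1 - 1.0 / n2)
    s_p2 = v2 - s_te2 / n2
    return s_p2, s_te2

def fitted_median(n, s_p2, s_te2):
    """Median absolute residual predicted for a district testing n pupils."""
    return FACTOR * math.sqrt(s_p2 + s_te2 / n)

s_p2, s_te2 = solve_two_points(10, 0.65, 100, 0.31)
print(round(s_p2, 3), round(s_te2, 2))
for n in (10, 28, 50, 75, 100):
    print(n, round(fitted_median(n, s_p2, s_te2), 2))
```

Because the curve is forced through the points for N = 10 and N = 100, those
two rows match exactly; the quality of the fit shows only in the rows
between them.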



Because the value for N = 28 was thought to vary from .40 by too much, a
variety of constants was tried. The best fit seemed to come using
$\sigma_{TE}^2 = 7.$[illegible] and $\sigma_P^2 = .146$. This set of
parameters yielded the following results:



    Number of pupils      Median
    tested in the         absolute
    second grade          residual      $.67\sqrt{\sigma_P^2 + \sigma_{TE}^2/N}$

          10                .65                  .61
          28                .40                  .42
          50                .36                  .36
          75                .34                  .33
         100                .31                  .31



Since these calculations are in standard scores, the estimates of
$\sigma_P^2$ and $\sigma_{TE}^2$ were then multiplied by the variance of mean
test scores. The final values for $\sigma_P^2$ and $\sigma_{TE}^2$ were 16.20
and 779.[illegible]9, respectively.

The procedure outlined above produced quite satisfactory results. About
50 percent of the large districts and small districts both were scoring
within their comparison score band. However, it was desired that this
procedure be improved upon for a variety of reasons: it was time-consuming
both to compute the medians and plot them, it was subject to observer bias,
and it was, frankly, a very inelegant solution. In addition, such a
procedure would not be satisfactory to use in a situation in which there were
not a large number of data points. Even with the large number of districts
in California (over 900 with second graders), the medians of the subgroups
had substantial random error associated with them.






The problem of potential observer bias seemed critical. To draw an analogy,
when one observes a scatterplot, it is difficult to draw one regression
line. Often, several lines appear as though they could describe the data
equally well. As a consequence, the definition of a best-fitting line has
been presented and accepted, and computation of a regression line is a
straightforward procedure. This problem is very similar. Several lines
can be drawn to describe the relationship between district size and the
standard errors of estimate. What was needed was some way to compute a
value for the standard error directly from the data, without resorting
to plots.

Going back to equation 1, it is clear that the mean $\sigma_E^2$ can be
computed for all districts simply by squaring the residual for each district,
summing, and dividing by the number of districts. Since $\sigma_P^2$ is
presumed to be constant for all districts, it can be computed if
$\sigma_{TE}^2 / N$ can be computed.

As an estimate of this term, the standard error of the mean ($\sigma_M$) was
computed for each district and then the statewide average of its square was
calculated ($\bar{\sigma}_M^2$). Thus, $\sigma_P^2$ could be estimated by

$\sigma_P^2 = \bar{\sigma}_E^2 - \bar{\sigma}_M^2 \qquad (3)$



From this point, the $\sigma_E^2$ for any district could be computed by adding
the variance error of measurement to the estimated variance of prediction.
While this procedure seemed to be reasonable, it did not work. Although
about 50 percent of the districts statewide scored within their comparison
score band, fewer than 50 percent of the small districts scored within and
more than 50 percent of the large districts did.



Current Practice 

The problem seemed to be that the measurement error term subtracted in
equation 3 ($\bar{\sigma}_M^2$) was too small. And in fact, it was
reasonable that it was too small. Only the measurement error associated
with the criterion was being considered; the predictor variables certainly
had error associated with their estimates as well (larger error for smaller
districts, smaller error for larger districts). It would seem as though
a more precise equation for estimating the variance of prediction would be

$\sigma_P^2 = \bar{\sigma}_E^2 - \overline{\Sigma\sigma_e^2} \qquad (4)$

where $\Sigma\sigma_e^2$ is the sum of all the variances of measurement error,
both for the criterion and the predictors.

Of course, the straight sum is not appropriate. There is collinearity
among the predictors. An approximation of the exact equation is possible
by merely considering the standard error of the mean, as in equation 3,
but multiplying it by an appropriate constant, here written $k$. Thus,

$\sigma_P^2 = \bar{\sigma}_E^2 - k\,\bar{\sigma}_M^2 \qquad (5)$

Equation 5 currently is being used by the California Assessment Program
in the computation of comparison score bands. The constant is empirically
determined, and in different situations varies between 2 and 4.

As a specific example, the results of 1975-76 testing for grade 2 are
reported. For that test, the constant used in equation 5 is 2.5. For each
district, a predicted score was computed, and then a residual score was
computed by subtracting the predicted score from the observed score. The
residual score was then squared. From this value was subtracted the
variance error of measurement for that district multiplied by 2.5. That value,
called a "difference score," was computed for each district. The mean
difference score for the state was 7.5216. Thus, the estimated variance
error of prediction for each district was 7.5216 + (2.5 x $\sigma_M^2$), where
$\sigma_M^2$ is the variance error of measurement for that district. The number
of districts scoring above, within or below their comparison score band
as a result of the use of this method of calculating the estimated variance
error is reported in Table 1.
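
The difference-score computation can be sketched in a few lines. This is a
minimal illustration with invented residuals and measurement-error variances;
the function names are hypothetical, and only the constant 2.5 is taken from
the text.

```python
import numpy as np

def variance_of_prediction(residuals, sem_sq, k):
    """Statewide mean 'difference score': squared residual minus k times
    the district's variance error of measurement."""
    diff = np.asarray(residuals, dtype=float) ** 2 - k * np.asarray(sem_sq, dtype=float)
    return diff.mean()

def district_error_variance(sigma_p2, sem_sq, k):
    """Estimated error variance for each district: the constant variance of
    prediction plus k times that district's measurement-error variance."""
    return sigma_p2 + k * np.asarray(sem_sq, dtype=float)

# Hypothetical data for five districts, ordered from smallest to largest.
residuals = np.array([6.0, -3.5, 2.0, -1.0, 0.5])
sem_sq = np.array([12.0, 3.0, 0.8, 0.2, 0.01])  # squared standard errors of the mean
k = 2.5                                          # constant reported for grade 2, 1975-76

sigma_p2 = variance_of_prediction(residuals, sem_sq, k)
print(sigma_p2)
print(district_error_variance(sigma_p2, sem_sq, k))
```

By construction the per-district variances average to the statewide mean
squared residual, so the constant changes how error is apportioned between
small and large districts without changing the overall level.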

The largest discrepancies from having 50 percent of the districts scoring
within their comparison score band are for the smallest districts (54.9
percent scored within) and the third category (45.5 percent scored within).
A chi-square test shows that neither of these values is statistically
significantly different at the .05 level from the desired percentage of 50.
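
The significance claim can be checked from the counts in Table 1. The paper
does not say how the chi-square test was set up; the sketch below assumes a
one-degree-of-freedom goodness-of-fit test of "within" versus "not within"
against an even split, with column totals obtained by summing the Table 1
counts. The function name is hypothetical.

```python
def chi_square_vs_half(within, total):
    """Chi-square statistic for observing `within` of `total` districts
    inside the band when half are expected inside and half outside."""
    expected = total / 2.0
    outside = total - within
    return ((within - expected) ** 2 + (outside - expected) ** 2) / expected

CRITICAL_05_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, .05 level

for label, within, total in [("1-20 pupils", 118, 215), ("51-150 pupils", 81, 178)]:
    stat = chi_square_vs_half(within, total)
    print(label, round(stat, 3), stat < CRITICAL_05_DF1)
```

Under these assumptions both statistics fall below the critical value, which
agrees with the conclusion stated in the text.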



Table 1

Number of California School Districts
Scoring Above, Within or Below Their Comparison Score Band
on the Grade 3, 1976 Report of Reading Test Results,
Reported by Size of District



                        Size of Districts
                        (Pupils per Grade)

            1-20         21-50        51-150       151-500      500+

Above     49 (22.8)*   52 (29.7)    49 (27.5)    45 (25.4)    31 (21.1)

Within   118 (54.9)    83 (47.4)    81 (45.5)    82 (46.3)    74 (50.3)

Below     48 (22.3)    40 (22.9)    48 (27.0)    50 (28.2)    42 (28.6)



*Column percents reported in parentheses



